Conditional Estimation of HMMs for Information Extraction

Authors

  • Joseph Smarr
  • Huy Nguyen
  • Dan Klein
  • Christopher D. Manning
Abstract

The usual procedure of optimizing hidden Markov models for data likelihood has undesirable consequences in information extraction: it focuses attention on the data rather than on the labeling task. Often, joint likelihood is poorly correlated with extraction F1. We demonstrate that optimizing the conditional likelihood of the target labels addresses these limitations and is more indicative of task performance. Comparing joint and conditional likelihood also helps to explain the empirical finding that, for IE, HMMs with fixed structures tend to outperform those with more flexible structures: fixed structures constrain EM to better optimize conditional likelihood.
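To make the distinction between the two objectives concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-state HMM (state 0 = background, state 1 = target field) with made-up parameters `PI`, `A`, and `B`. It computes the joint log-likelihood log P(x, y) of a labeled sequence and the conditional log-likelihood log P(y | x) = log P(x, y) - log P(x), where P(x) comes from the forward algorithm.

```python
import math
from itertools import product

# Hypothetical toy parameters (assumptions for illustration only).
PI = [0.8, 0.2]                          # initial state distribution
A = [[0.9, 0.1], [0.3, 0.7]]             # A[r][s] = P(next state s | state r)
B = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]   # B[s][o] = P(symbol o | state s)


def joint_log_likelihood(states, obs, pi=PI, a=A, b=B):
    """log P(obs, states): the quantity that joint training maximizes."""
    logp = math.log(pi[states[0]]) + math.log(b[states[0]][obs[0]])
    for t in range(1, len(obs)):
        logp += math.log(a[states[t - 1]][states[t]])
        logp += math.log(b[states[t]][obs[t]])
    return logp


def marginal_log_likelihood(obs, pi=PI, a=A, b=B):
    """log P(obs), summed over all state paths (scaled forward algorithm)."""
    n = len(pi)
    alpha = [pi[s] * b[s][obs[0]] for s in range(n)]
    log_z = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * a[r][s] for r in range(n)) * b[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)               # rescale to avoid underflow
        log_z += math.log(scale)
        alpha = [v / scale for v in alpha]
    return log_z


def conditional_log_likelihood(states, obs, pi=PI, a=A, b=B):
    """log P(states | obs) = log P(obs, states) - log P(obs)."""
    return (joint_log_likelihood(states, obs, pi, a, b)
            - marginal_log_likelihood(obs, pi, a, b))


if __name__ == "__main__":
    obs = [0, 2, 2, 1]
    labels = [0, 1, 1, 0]
    print("joint log-likelihood:      ", joint_log_likelihood(labels, obs))
    print("conditional log-likelihood:", conditional_log_likelihood(labels, obs))
```

The joint score rewards any parameter change that makes the observed tokens more probable, even when the labels become no easier to recover. The conditional score subtracts log P(x), so it improves only when the correct labeling gains probability relative to competing labelings.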


Similar resources

High-recall protein entity recognition using a dictionary

SUMMARY Protein name extraction is an important step in mining biological literature. We describe two new methods for this task: semiCRFs and dictionary HMMs. SemiCRFs are a recently-proposed extension to conditional random fields (CRFs) that enables more effective use of dictionary information as features. Dictionary HMMs are a technique in which a dictionary is converted to a large HMM that r...


Seminar Report Scalable Algorithms For Information Extraction

Information Extraction from unstructured sources like the web is one of the interesting problems in machine learning. Part of Speech (PoS) tagging, segmentation of text, and Named Entity Recognition (NER) are some of the applications of Information Extraction. There are many models like Hidden Markov Models (HMMs), Maximum Entropy Markov Models (MEMMs), Conditional Random Fields (CRFs) and Semi-Conditi...


Hidden Markov Models for Information Extraction

Compared to many other techniques used in natural language processing, hidden Markov models (HMMs) are an extremely flexible tool and have been successfully applied to a wide variety of stochastic modeling tasks. This paper uses a machine learning approach to examine the effectiveness of HMMs on extracting information of varying levels of structure. A stochastic optimization procedure is used...


Information Extraction with HMMs and Shrinkage

Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling time series data, and have been applied with success to many language-related tasks such as part of speech tagging, speech recognition, text segmentation and topic detection. This paper describes the application of HMMs to another language related task--information extraction--the problem of locating textual sub-segments...


Arabic Handwritten Word Recognition based on Bernoulli Mixture HMMs

This thesis presents new approaches in off-line Arabic Handwriting Recognition based on conventional Bernoulli Hidden Markov models. Off-line handwriting recognition, and Arabic handwriting recognition in particular, is still far from perfect. Hidden Markov Models (HMMs) are now widely used for off-line handwriting recognition in many languages and, in particular, in A...




Publication date: 2003